In this paper, we present an approach for mining changes in customers' behavior in order to maintain a robust profiling model over time. Most previous studies leave important questions unanswered: in developing B2C e-commerce strategies, how can managers implicitly load customers' profiles based on their satisfaction with the online store's characteristics? And what kinds of feedback segments do these customers form? Our proposed approach does not force customers to express their preferences about the online service explicitly; instead, it captures their preferences from their online activities. The challenge lies not only in analyzing how and when a customer's classifier model changes, but also in adapting it to the customer's click stream data using a new decision tree generation algorithm that takes a new set of variables as input: categorical, continuous and fuzzy variables. Customers' online review ratings are considered as classes. Experiments show that this work performed well in identifying relevant customer stream data to judge the Chinese e-commerce website "Tmall". The extracted values of the website's features are also useful for identifying the satisfaction level when the customer's rating is not available.

Keywords: Fuzzy logic, Navigation session, Web tracking
1. INTRODUCTION

The problem of load profiling and classifying customers has been studied extensively. Customer profiling approaches have been used in e-commerce, where systems keep a variety of customer information, such as navigational and behavioral patterns, and use the K-means algorithm for accurate profile construction [1], [2]. A. D. Rachid et al. [3] proposed a customer prediction model in an e-commerce context, in which the clustering phase integrates the k-means method with the Length-Recency-Frequency-Monetary (LRFM) model. Girish S et al. [4] integrated a classifier to predict the type of purchase that a customer would make, as well as the number of visits that he or she would make during a year. User profiles can also be created through multi-resolution clustering designed for smart metering data [5]. Smart meter technologies allow retailers to monitor an individual's consumption in real time. Efforts have been made to mine these time series data for user profiling: Y. Wang et al. [6] used data gathered by Advanced Metering Infrastructure to better understand electrical consumption behavior, Q. Wang et al. [7] attempted to automate spike detection within large volumes of smart meter data for load profiling, and Y. Lu et al. [8] investigated the load profile clustering of smart grid customers using an adaptive weighted fuzzy clustering algorithm. In [9], the authors developed a classification system that identifies a specific user among several users by investigating different metadata characterizing load profiles, including raw measurements, frequency characterizations and typical load shape indexes. A. E. Frosta et al. [10] tried to reduce the data to be processed during customer load profiling: Typical Daily Profiles (TDP) and Typical Weekly Profiles (TWP) are compared to see how the time resolution of the data affects the clustering. In [11], the profile is acquired by explicitly asking users to enter and update their profile manually and then monitoring their browsing behavior. A framework that integrates user profiling and the customer journey is proposed by the authors of [12], using user interviews and semantic analysis to identify the target audience. Marketing actions are then set up for the customer journey and, finally, for the conversion stage, which converts a user into a customer.

Even though these studies reflect customer consumption behavior, they ignore data from customers' click streams and online review ratings. Moreover, although they draw on various customer behavior data to create customer profiles and recommendations, they do not judge or evaluate e-commerce website characteristics based on customers' feedback.

For the evaluation of business e-commerce websites, P. Dong [13] established several site evaluation indexes: an information content index, a site survey index and a technical index. T. Hariguna et al. [14] empirically analyzed three antecedent factors of trust: system quality, information quality and service quality. Their study concluded that these factors had a positive impact on customers' intention to purchase in e-commerce transactions on social media. T. Singh [15] identified factors related to the usability of e-commerce websites: user satisfaction, simplicity, attractiveness, speed, efficiency and searching for product information. A survey used these factors as valuable input from users for assessing the usability of an e-commerce website. The authors of [16] proposed a framework for evaluating the impact of four parameters on the success of e-commerce: customer satisfaction, costs, awareness and knowledge, and infrastructure. The authors of [17] argued that all the success factors supporting online shopping should be considered from multiple perspectives (a technological perspective such as web design usability, a social perspective such as social networks, etc.) and across different e-commerce life cycle stages (pre-sale, information stage and supply chain).

Various e-commerce website attributes intended for data collection are reviewed, keeping in mind the areas where the sequence of web events generated by each user is required to assess online service quality. We therefore discuss the general steps customers follow when shopping in online stores. H. Sergio et al. [18] analyzed the sections visited by users, the navigational paths followed when accessing specific pages of the website, and the relations between different web sections, including the sections that lead users to buy products. To improve user fulfillment and the shopping experience, it has become general practice for online sellers to allow their users to review, or to communicate opinions on, the products they have been sold. The major goal of [19] was to solve the evaluation feature extraction problem and the opinion classification problem using feature words and opinion words from customers' product reviews. G. Silahtaroglu et al. [20] collected data about customers' mouse movements, their demographic information and the items they added to their shopping baskets. A global view of sequential online-shopping events was proposed by the authors of [21], where data sets consist of two parts: a collection of click events and the events attached to the corresponding clicks. R. Purwaningsih et al. [22] pointed out that the factors affecting consumer interest in online shopping are Perceived Concentration, Perceived Enjoyment and Perceived Ease of Use.

Identifying the values of these dimensions implicitly, based on the e-customer's click stream, is our second task. These data should then be analyzed in order to load the most accurate customer profile. In the next sections, we present the details of the proposed customer profiling approach, which reflects e-commerce website features and customers' behavior and feedback; we then mine the resulting dataset and obtain results for customers of the Tmall website. Data are analyzed through a fuzzy approach to classify customers using a new decision tree algorithm that extends existing approaches [23], [24], [25], [26], [27] to profile generation, taking into account a new set of evaluation variables obtained from customers' feedback on the e-commerce website.

The classification task is characterized by well-defined classes and a training set of pre-classified examples. Where the data to be classified are static, a simple classifier might suffice. However, our proposed approach requires a classifier that is resilient to dynamic data, such as time-related and click stream data, and that takes the online customer satisfaction level as its predefined classes. The major innovation of the proposed approach is its use of decision tree induction to obtain useful knowledge from large amounts of e-customer data in order to construct online customer profiles. The purpose of our study is to fill this gap by conducting two phases, training and inference. The training phase involves inducing a decision tree ensemble from customer reviews and click stream data. The inference phase uses the decision tree ensemble induced in the training phase to classify customers implicitly, using their click streams as test data.

After this overview, which brings together two different branches of research (online customer profiling techniques and feature selection based on customers' shopping steps for load profiles), details of our proposed approach are provided in Section 2, followed in Section 3 by a case study conducted on real user action data provided by Tmall, one of the largest B2C online retail platforms in China. The paper concludes in Section 4 with a summary of the work undertaken and directions for future work.


2. PROPOSED LOAD PROFILING METHODOLOGY 
2.1. Framework for proposed model 
The proposed algorithm for customer profile generation has the following steps (Figure 1):
Step 1: Collect web data (click stream and online review data) from the e-commerce website.
Step 2: Apply preprocessing techniques to remove noise from the data.
Step 3: Divide the data into two parts: training data (online customer reviews and click stream data) and testing data (click stream data).
Step 4: Apply appropriate data mining techniques to create the model.
Step 5: Train the model according to the given rule-based classifier.
Step 6: If the model is not trained, go to Step 5.
Step 7: Test the model using the test data.
Step 8: If the model is accepted, use it for online customer profiling.
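
The data split in Step 3 can be illustrated with a minimal Python sketch. It assumes, in line with the dataset description in Section 3.1, that customers who left a review form the training set and customers without feedback form the testing set. The record layout, the function name `split_by_feedback` and the rating value used below are hypothetical and serve only as an illustration.

```python
from collections import defaultdict


def split_by_feedback(clicks, reviews):
    """Split click stream rows into a training part and a testing part.

    clicks  : list of (user_id, item_id, action, timestamp) tuples
    reviews : dict mapping user_id -> star rating
    Users with a rating go to the training set (the rating is the class
    label); users without one go to the testing set.
    """
    training, testing = defaultdict(list), defaultdict(list)
    for user_id, item_id, action, timestamp in clicks:
        target = training if user_id in reviews else testing
        target[user_id].append((item_id, action, timestamp))
    labels = {user_id: reviews[user_id] for user_id in training}
    return dict(training), labels, dict(testing)


clicks = [
    ("u41", "i161", "click", "2014-09-30 15:19:00"),
    ("u41", "i534", "alipay", "2014-09-03 15:46:01"),
    ("u39", "i146", "click", "2014-09-03 21:01:00"),
]
reviews = {"u41": 5}  # hypothetical rating, for illustration only
training, labels, testing = split_by_feedback(clicks, reviews)
```

Keeping the label dictionary separate from the click rows mirrors the framework, where the review supplies the class and the click stream supplies the attributes.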


[Figure 1: flowchart with the elements Click Stream Data, Online Customer Reviews, Choose Data Mining Approach (Rule-Based Classifier, Decision Tree), Is model trained?, Training Dataset, Testing Dataset, Apply for Customer Profiling and Profiled Customer.]

Figure 1. An overview of the proposed methodology


2.2. Metadata gathering and processing 

User profiling can be defined as the process of identifying data about a user's domain of interest. The system can use this information to improve retrieval so that it better meets the user's needs. This part presents the dynamics of the proposed model using a communication diagram that shows the interactions between the various components of the system. The goal is to determine the graphs of data that will be subjected to an appropriate optimization algorithm based on customer behavior targeting.


2.2.1. Capturing dynamic customer’s behavior 

In Figure 2, communication starts with 1: ts=consult_website(), where ts is the time spent consulting the website without discovering any product. 2*: find_products() is an iterative message that can be repeated an unspecified number of times. The customer searches product descriptions (2.1: search():tp), where tp is the consultation time of a product. If the customer is interested in some product(s), they consult the description (2.2 [interested]: view_product():ps), where ps refers to the consulted product(s). If the customer decides to buy, they add the product to the basket (2.3 [decided to buy]: add_to_basket()). 3: checkout() includes getting the list of products from the online basket (3.1: get_products():p's), consulting the bill (3.2: consult_bill():t'p), updating the shopping cart (3.3: update_shopping_basket()) and purchasing (3.4 [not empty (cart)]: make_order():p''s), where p''s refers to the purchased products.


[Figure 2: communication diagram linking the e-customer, the e-commerce website, the products description, the product, the shopping cart, the bill and the purchase through messages 1 to 3.4. Its annotations read as follows.
If the customer exceeds a threshold t0 (ts > t0), then there is a problem at the level of the website's efficiency.
If p's = 0 and tp > t'0, then the website's content clarity is weak.
If p's - p''s ≠ 0 or no purchase was made, then the payment system availability of the website is weak.]

Figure 2. Communication diagram of the dynamic customer profiling for online retailing assessment
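
The three annotations of Figure 2 can be read as simple threshold rules. The sketch below is one possible reading, not the authors' implementation: the function name, the argument names and the numeric thresholds in the usage lines are hypothetical, and basket and purchase contents are reduced to product counts.

```python
def diagnose_session(ts, t0, tp, t0_prime, basket_count, purchased_count):
    """Apply the three annotations of Figure 2 to one navigation session.

    ts, tp          : website and product consultation times
    t0, t0_prime    : thresholds on ts and tp (application-specific values)
    basket_count    : number of products placed in the basket (p's)
    purchased_count : number of products actually purchased (p''s)
    Returns the list of website features flagged as weak.
    """
    issues = []
    if ts > t0:
        issues.append("efficiency")
    if basket_count == 0 and tp > t0_prime:
        issues.append("content clarity")
    if purchased_count == 0 or basket_count - purchased_count != 0:
        issues.append("payment system availability")
    return issues


# Hypothetical values: times in seconds, thresholds chosen arbitrarily.
smooth = diagnose_session(ts=10, t0=30, tp=20, t0_prime=90,
                          basket_count=2, purchased_count=2)
abandoned = diagnose_session(ts=10, t0=30, tp=20, t0_prime=90,
                             basket_count=3, purchased_count=1)
```

In the paper these conditions feed graded fuzzy values rather than binary flags; the binary form is used here only to make the three conditions explicit.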


2.2.2. Candidate entries variables 
The starting and critical point for successful customer profiling based on navigational data is data preprocessing. The required high-level tasks include data cleaning, user identification, session identification and product view identification. The proposed data preprocessing model for customer analysis tracks the various online activities of customers to construct logical user sessions and to create the relevant input variables for our e-customer profiling system, which takes into account a new set of variables (categorical, continuous and fuzzy) for evaluating online service quality.
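
Session identification can be sketched as follows. The paper does not state how sessions are delimited, so the 30-minute inactivity timeout, the event layout and the function name `build_sessions` are assumptions made purely for illustration.

```python
from datetime import datetime, timedelta

SESSION_GAP = timedelta(minutes=30)  # assumed inactivity timeout, not from the paper


def build_sessions(events, gap=SESSION_GAP):
    """Group (user_id, action, timestamp) events into per-user sessions.

    A new session starts whenever the same user has been inactive for
    longer than `gap`. Returns {user_id: [[(action, timestamp), ...], ...]}.
    """
    sessions = {}
    for user_id, action, ts in sorted(events, key=lambda e: (e[0], e[2])):
        user_sessions = sessions.setdefault(user_id, [])
        if user_sessions and ts - user_sessions[-1][-1][1] <= gap:
            user_sessions[-1].append((action, ts))   # continue current session
        else:
            user_sessions.append([(action, ts)])     # open a new session
    return sessions


start = datetime(2014, 9, 3, 21, 1)
events = [
    ("u39", "click", start),
    ("u39", "cart", start + timedelta(minutes=10)),
    ("u39", "alipay", start + timedelta(hours=2)),
    ("u3286", "collect", start + timedelta(days=1)),
]
sessions = build_sessions(events)
```

With these sample events, the first two actions of u39 fall into one session and the later payment opens a second one.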

After the customer's order is shipped, we collect the customer's feedback to assign values to Order Fulfillment, a categorical variable, and Delivery Time, a numeric one. According to Figure 2, we have three other variables, Efficiency, Content and Payment System Availability, which are treated as fuzzy attributes, as shown in Table 4.


2.2.3. Fuzzification 
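Restrictions placed on the values of the fuzzy variables, which define their possible changes, are based on e-customer behavior during navigation sessions: $S = S_v \cup S_c \cup S_b \cup S_p$, where $S_v$, $S_c$, $S_b$ and $S_p$ are, respectively, the visit sessions, consultation sessions, basket sessions and purchase sessions [28].

$$V_{Efficiency} = \frac{1}{card(S)} \sum_{s \in S} \big( \varphi(t_s)\,\psi(s) + \omega(s) \big)$$

where $\varphi(t_s)$ is defined piecewise according to whether $t_s < t_0$, and $\psi(s)$ and $\omega(s)$ are defined piecewise according to whether $s \in S_v$ or $s \in S - S_v$.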


A value of -1 indicates that no consultation, basket placement or purchase was made after the visit; efficiency is therefore "Very Weak". A value of 1 indicates that there was a consultation, basket placement or purchase, so the website's efficiency is considered "Very Strong".
For the Content variable, a value of 1 indicates that there was a basket placement after the consultation of the product's information, and a value of -1 indicates that there was no basket placement:
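$$V_{Content} = \frac{1}{card(S_c)} \sum_{s \in S_c} \left( \frac{1}{card(P_s)} \sum_{p \in P_s} \big( \varphi'(t_p)\,\psi'(p) + \omega'(p) \big) \right)$$

where $\varphi'(t_p)$ is defined piecewise according to whether $t_p < t'_0$, and $\psi'(p)$ and $\omega'(p)$ are defined piecewise according to whether $p \in P'_s$ or $p \in P_s - P'_s$.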


The Payment of an order is very weak if no purchase was made or if product(s) added to the online basket were deleted. A value of 1 indicates that there was a successful purchase:


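$$V_{Payment} = \frac{1}{card(S_b)} \sum_{s \in S_b} \left( \frac{1}{card(P'_s)} \sum_{p \in P'_s} \big( \varphi''(t_p)\,\psi''(p) + \omega''(p) \big) \right)$$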


A membership function (MF) is defined to find the membership value of each fuzzy input in five levels: very weak, weak, medium, strong and very strong. Figure 3 shows the membership functions used in our method:


[Figure 3: three panels, (a), (b) and (c), plotting membership functions labeled VW, W, M, S and VS on a scale from -1 to 1.]

Figure 3. MFs used to present the linguistic labels of: (a) Efficiency (b) Content (c) Payment System Availability
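
As an illustration of this fuzzification step, the sketch below assumes evenly spaced triangular membership functions with peaks at -1, -0.5, 0, 0.5 and 1. The exact shapes and breakpoints in Figure 3 may differ between the three variables, so both the shape and the function name `fuzzify` are assumptions.

```python
LABELS = ["VW", "W", "M", "S", "VS"]
PEAKS = [-1.0, -0.5, 0.0, 0.5, 1.0]  # assumed peak positions on the [-1, 1] scale
HALF_WIDTH = 0.5                     # assumed distance between neighbouring peaks


def fuzzify(value):
    """Return the membership degree of `value` in each linguistic level."""
    value = max(-1.0, min(1.0, value))  # clamp to the attribute scale
    return {
        label: max(0.0, 1.0 - abs(value - peak) / HALF_WIDTH)
        for label, peak in zip(LABELS, PEAKS)
    }


degrees = fuzzify(0.25)
```

Under these assumptions a crisp value of 0.25 belongs half to Medium and half to Strong, which has the same form as the example (0.0/VW + 0.0/W + 0.5/M + 0.5/S + 0.0/VS) given in Section 3.1.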


2.3. Proposed data mining approach 

The proposed approach consists of two phases, training and inference, as outlined in Figure 4. In the training phase, online customers' reviews and click stream data are processed and a decision tree ensemble is derived for change mining. The inference phase uses the classifier derived in the training phase for the dynamic profiling of online customers. The decision tree generation algorithm implements our proposed feature selection strategy, adapted to evolving data.


Training Phase
Step 1: Define the training data set and class labels (online reviews filtered by stars).
Step 2: Preprocess the training data before applying it for training.
Step 3: Derive the decision tree.

Inference Phase
Step 4: Track the interactions of the customer to classify (testing behavioral data).
Step 5: Preprocess the data before applying it for testing.
Step 6: Use the derived decision tree to classify customers and obtain the final class.

Figure 4. Change mining of customer profile based on navigational data
2.3.1. Learning phase
Let $T$ be a training set containing $n$ samples belonging to the classes $\{C_1, C_2, \ldots, C_p\}$, where $C_k$ ($1 \le k \le p$) denotes the $k$-th class. Let $A_q$ be an attribute that may influence consumer satisfaction, with value set $\{x_1, x_2, \ldots, x_v\}$, where $v$ is the number of values. The dissimilarity $dis(x_u, x_v)$ between any two values of $A_q$ for the class $C_k$ can be computed by:

$$dis(x_u, x_v) = \begin{cases} \dfrac{|x_u - x_v|}{Max(A_q) - Min(A_q)} & \text{if } A_q \text{ is numeric} \\ 0 \text{ or } 1 & \text{if } A_q \text{ is categorical} \\ d(x_u, x_v) & \text{if } A_q \text{ is fuzzy} \end{cases}$$

where $d(x_u, x_v) = \left( \sum_{j=1}^{5} (x_{uj} - x_{vj})^2 \right)^{1/2}$ is a Euclidean distance, and $x_{uj}$ and $x_{vj}$ are the components of the vectors of membership in the fuzzy objects $\{a_1, a_2, \ldots, a_5\}$, which are represented by functions mapping the fuzzy attribute scale (Figure 3), with $x_u = (x_{u1}, x_{u2}, \ldots, x_{u5})$.

The average similarity of $A_q$ for the class $C_k$, denoted $sim(A_{qC_k})$, can be computed by:

$$sim(A_{qC_k}) = \frac{2 \sum_{x_u, x_v \in C_k,\; u < v} \dfrac{1}{1 + dis(x_u, x_v)}}{|C_k| \, (|C_k| - 1)} \in [0, 1]$$

$|C_k|$ is the size of $C_k$. If $|C_k| = 1$, then $sim(A_{qC_k}) = 1$.

The global similarity of the attribute $A_q$ is the sum of the similarities of each subset, denoted $SIM(A_q)$, which can be defined as:

$$SIM(A_q) = \sum_{k=1}^{p} \big( P_k \cdot sim(A_{qC_k}) \big) \in [0, 1]$$

The set of class values is $\{C_1, C_2, \ldots, C_p\}$ and their probabilities are $\{P_1, P_2, \ldots, P_p\}$.


Node_Selection (T, A_q):
    For each A_q in attribute_list do
        SIM(A_q) = 0
        For each class C_k do
            SIM(A_q) = SIM(A_q) + sim(A_qCk) * P_k
        End for
    End for
    Return the attribute A_q with the maximum similarity

Calculate_Splitting_Threshold (T, A_q):
    // A_q is the numeric attribute
    Sort the dataset T by the numeric attribute's values
    max = 0
    For each value x_v do
        Calculate the average similarity of A_q for the two subsets of T divided by x_v
        If max < SIM(A_q) Then
            max = SIM(A_q)
            Splitting_Value = x_v, where SIM(A_q) is maximum
        End if
    End for
    Return x_v
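
The similarity measures and Node_Selection can be sketched in Python as follows. Several points are assumptions rather than statements from the paper: the categorical dissimilarity is taken as 0 for equal values and 1 otherwise, each pair of values is counted once, the class probability P_k is estimated by the relative class frequency, and all function names are hypothetical.

```python
import math
from itertools import combinations


def dis(xu, xv, kind, value_range=1.0):
    """Dissimilarity between two values of one attribute."""
    if kind == "numeric":
        return abs(xu - xv) / value_range if value_range else 0.0
    if kind == "categorical":
        return 0.0 if xu == xv else 1.0  # assumed reading of "0 or 1"
    if kind == "fuzzy":  # xu and xv are membership vectors
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(xu, xv)))
    raise ValueError(f"unknown attribute kind: {kind}")


def sim_in_class(values, kind, value_range=1.0):
    """Average pairwise similarity of the attribute values inside one class."""
    n = len(values)
    if n <= 1:
        return 1.0
    total = sum(1.0 / (1.0 + dis(a, b, kind, value_range))
                for a, b in combinations(values, 2))
    return 2.0 * total / (n * (n - 1))


def global_similarity(values, labels, kind):
    """SIM(A_q): per-class similarities weighted by class frequency."""
    value_range = (max(values) - min(values)) if kind == "numeric" else 1.0
    by_class = {}
    for value, label in zip(values, labels):
        by_class.setdefault(label, []).append(value)
    n = len(values)
    return sum(len(vs) / n * sim_in_class(vs, kind, value_range)
               for vs in by_class.values())


def node_selection(dataset, labels, kinds):
    """Return the attribute name with the highest global similarity."""
    return max(dataset,
               key=lambda a: global_similarity(dataset[a], labels, kinds[a]))


labels = ["Satisfied", "Satisfied", "Not Satisfied", "Not Satisfied"]
dataset = {
    "delivery_time": [3, 10, 3, 10],
    "order_fulfillment": ["ATP", "ATP", "NATP", "NATP"],
}
kinds = {"delivery_time": "numeric", "order_fulfillment": "categorical"}
root = node_selection(dataset, labels, kinds)
```

In this toy data the order fulfillment values are identical inside each class, so that attribute reaches the maximum similarity and is selected as the root.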


Figure 5 provides a better understanding of the decision tree induction process in a data stream scenario, focusing in particular on the mathematical foundations for choosing the root and for splitting on continuous, categorical and fuzzy criteria in decision tree nodes.
Generate_Decision_Tree (T, A_q, Y)
Inputs: Training data: T; set of continuous, categorical and fuzzy valued attributes: A_q; target attribute: Y
Output: A decision tree
Start
    If T is empty Then
        Finish the function without building a node
    End if
    If all samples in T have the same target value Then
        Return a single node with this value
    End if
    If A_q is empty Then
        Return a single node whose value is the most common value of the target attribute in T
    End if
    D ← Node_Selection (T, A_q)
    // D is the root of the tree
    If D is of continuous type Then
        x_v ← Calculate_Splitting_Threshold (T, A_q)
    End if
    // Grow the tree and partition the data using the continuous split
    {x_1, x_2, ..., x_v} ← Values of the attribute D
    Generate_Decision_Tree for the sub-trees (T_1, A_q-{D}, Y), (T_2, A_q-{D}, Y), ..., (T_v, A_q-{D}, Y)
    Return the constructed tree whose root is D and whose arcs are labeled by x_1, x_2, ..., x_v
End


Figure 5. Pseudo code of decision tree induction algorithm
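
The recursion of Figure 5 can be sketched compactly in Python. This is a simplified illustration restricted to categorical attributes: numeric thresholding and fuzzy splits are left out, the categorical dissimilarity is assumed to be 0 for equal values and 1 otherwise, and the function and column names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def similarity(values, labels):
    """Class-weighted within-class similarity of a categorical attribute."""
    by_class = {}
    for value, label in zip(values, labels):
        by_class.setdefault(label, []).append(value)
    score = 0.0
    for vs in by_class.values():
        pairs = list(combinations(vs, 2))
        # 1 / (1 + dis) is 1.0 for equal values and 0.5 for different ones
        sim = 1.0 if not pairs else sum(
            1.0 if a == b else 0.5 for a, b in pairs) / len(pairs)
        score += len(vs) / len(values) * sim
    return score


def generate_tree(rows, attributes, target):
    """rows: list of dicts; attributes: attribute names; target: class column."""
    if not rows:
        return None                                    # no node is built
    labels = [row[target] for row in rows]
    if len(set(labels)) == 1:
        return labels[0]                               # pure leaf
    if not attributes:
        return Counter(labels).most_common(1)[0][0]    # majority leaf
    best = max(attributes,
               key=lambda a: similarity([row[a] for row in rows], labels))
    remaining = [a for a in attributes if a != best]
    branches = {}
    for value in sorted({row[best] for row in rows}):
        subset = [row for row in rows if row[best] == value]
        branches[value] = generate_tree(subset, remaining, target)
    return {best: branches}


rows = [
    {"order": "ATP", "payment": "S", "cls": "Satisfied"},
    {"order": "ATP", "payment": "W", "cls": "Satisfied"},
    {"order": "NATP", "payment": "S", "cls": "Not Satisfied"},
    {"order": "NATP", "payment": "W", "cls": "Not Satisfied"},
]
tree = generate_tree(rows, ["order", "payment"], "cls")
```

The nested dictionary returned here plays the role of the tree whose root is D and whose arcs are labeled by the values of D.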


2.3.2. Classification of a new point 

To classify a new input point, we simply traverse down the tree. At each node in the decision tree, we ask a question about the data point. For categorical and numeric attributes, if the condition at the node is met, we go left; otherwise, we go right. For fuzzy criteria, we calculate the value of d(x_u, x_v), which translates the membership possibility of the new input point x_u in each fuzzy subspace x_v in the tree. The best mark d(x_u, x_v) is the one with the minimum value; it identifies the fuzzy split value that satisfies the condition at the node in question.
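
For a fuzzy node, the branch choice amounts to a nearest-vector lookup under the Euclidean distance. The sketch below assumes membership vectors ordered as (VW, W, M, S, VS); the function name and the sample vectors are illustrative.

```python
import math


def nearest_fuzzy_branch(x_new, branches):
    """Pick the branch whose membership vector is closest to x_new.

    x_new    : membership vector of the new point, ordered (VW, W, M, S, VS)
    branches : {branch label: membership vector stored at the node}
    """
    return min(branches, key=lambda label: math.dist(x_new, branches[label]))


branches = {
    "Weak": (0.6, 0.4, 0.0, 0.0, 0.0),
    "Medium": (0.0, 0.0, 1.0, 0.0, 0.0),
    "Strong": (0.0, 0.0, 0.0, 1.0, 0.0),
}
chosen = nearest_fuzzy_branch((0.0, 0.0, 0.2, 0.8, 0.0), branches)
```

The new point is mostly Strong with a small Medium component, so the Strong branch has the minimum distance and is followed.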


3. VALIDATION METHODOLOGY AND RESULTS 
3.1. Dataset 

We chose reviews and click stream data from TMALL as the data source. TMALL is an important business unit of Alibaba Group and is known as the leading B2C platform in China. Users' browsing behavior on TMALL reflects their item preferences. This data set contains 25432915908 records of user-item interactions. The features of each row are listed in Table 1. Table 2 presents the review data for part of the "user-item" pairs, which contains the review and the rating of the item, merchant and logistics. This data set contains 241919749 rows, corresponding to 241919749 reviews.

Table 4 describes the training data used, where the class is deduced from online reviews rated on a 5-star scale that expresses the customer satisfaction rate, as illustrated in Table 3. For each class of each customer, restrictions can be placed on the values of the website's features to define their possible changes based on e-customer behavior during navigation sessions, as described in Section 2.2.3. Using click stream data, these features are classified into 5 linguistic terms through the membership function values shown in Figure 3, e.g. (0.0/VW + 0.0/W + 0.5/M + 0.5/S + 0.0/VS), where VW, W, M, S and VS stand for Very Weak, Weak, Medium, Strong and Very Strong. Customers who did not give feedback were chosen as the testing set, as shown in Table 5.
Table 1. User-Item Interaction

Item       Definition
User_id    A string such as "09774184", denoting a unique user
Item_id    An integer in [1, 8133507], denoting a unique item
Action     Type of behavior: a string such as "click", "collect", "cart" or "alipay", representing 'click', 'add to favorite', 'add to cart' and 'purchase', respectively
Vtime      Timestamp of the user's behavior


Table 2. User-Reviews

Item           Definition
User_id        A string such as "09774184", denoting a unique user
Item_id        An integer in [1, 8133507], denoting a unique item
Feedback       A string containing multiple keywords, separated by ''. These words are extracted from the raw title by an NLP system
Rate_pic_url   A URL linked to the corresponding image online
Gmt_create     Timestamp of the review, a string in the form "yyyy-mm-dd hh:mm:ss"


Table 3. Tmall Data Sampling

User Id   Item Id   User Action   V-Time                Feedback
u41       i161      Click         30/09/2014 15:19:00   [star rating]
          i534      Alipay        03/09/2014 15:46:01
u57       i135      Cart          26/09/2014 20:32:01   [star rating]
u1209     i110      Alipay        01/09/2014 22:28:29   [star rating]
u1641     i109      Alipay        02/09/2014 11:50:52   [star rating]


Table 4. Descriptive Example of Used Training Data

Columns: User-Id | Class | Efficiency (VS, S, M, W, VW) | Content (VS, S, M, W, VW) | Payment System Availability (VS, S, M, W, VW) | Order Fulfillment | Delivery Time
u48 Satisfied 05s o5 0 0 0 001 0 0 06 04 0 0 o Sele 3 
Promise 
wai Vey. 1 0 000001 0 0 ı 0o 0 o o TO eg 
Satisfied Promise 
ul725 Moderately 3 o o o0o0o01o0 0o 0o 07 03 0 o o ®iable-To- jo 
Satisfied Promise 
u2502 Moderately 3 o oo0oo0o1o0o0o 0o 0o 0o o ı o o Aable-To- 70 
Satisfied Promise 
Not Not 
u394 93 x 08 02 0 0 0 0 1 0 0 0 0 0 1 0 0 Available-To- 20 
Satisfied ` 
Promise 
uso VOY, 1 0 000000 05 05 1 0o 0o o o Ahable-To 6 
Satisfied Promise 
aso Moderately o6 o4 o 0 0 010 0 0o ı 0o 0o o o eo 8 
Satisfied Promise 
Somewhat Not 
ul675 Ne R 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 Available-To- 10 
Satisfied i 
Promise 

Table 5. Tmall Testing Data

User Id   Item Id   User Action   V-Time
u39       i146      Click         03/09/2014 21:01
          i53       Cart          29/09/2014 11:17
          i366      Alipay        04/09/2014 19:23
          i34       Click         06/09/2014 20:24
u3286     i152      Collect       27/09/2014 0:28
u2984     i41       Cart          24/09/2014 21:36
          i381      Click         18/09/2014 11:22


3.2. Programming setup 

The deployment was performed on a system with an Intel Core i7 processor, 8 GB of memory and Windows 7. The method was implemented in Java using Eclipse Neon.3, with CSV files. The proposed algorithm was run on several kinds of customer datasets to estimate the effectiveness of the proposed approach. We ran online TMALL customer reviews and click stream data through Step 1 and Step 2 of the process outlined in Figure 4. The data shown in Table 4 were randomly chosen as the training set. This data set contains rows corresponding to records of user-item interactions and reviews, with the customer satisfaction degree as the target attribute and the characteristics of the TMALL website as non-target attributes.

The simulation results are summarized in Figure 6. The algorithm produces as output a tree that resembles an orientation diagram, where each end node (leaf) is a decision (a class) and each non-final (internal) node represents a test. At each node in the decision tree, we ask a question about the in-store navigational data point taken as input, as shown in Table 5.


[Order Fulfillment]
Available To Promise : [Efficiency]
Strong {0.0, 0.0, 0.0, 1.0, 0.0} : [Payment System Availability]
Very Strong {0.0, 0.0, 0.0, 0.5333333, 0.4666667} : [Content]
Very Strong {0.0, 0.0, 0.0, 0.57142854, 0.42857146} : [Delivery Time]
less than 10.0 : [Class = Satisfied]
Available To Promise : [Delivery Time]
less than 18.0 : [Efficiency]
Medium {0.0, 0.0, 1.0, 0.0, 0.0} : [Payment System Availability]
Medium {0.0, 0.0, 1.0, 0.0, 0.0} : [Content]
Strong {0.0, 0.0, 0.1666667, 0.8333333, 0.0} : [Class = Moderately Satisfied]
Strong {0.0, 0.0, 0.19999999, 0.8, 0.0} : [Content]
Medium {0.0, 0.0, 1.0, 0.0, 0.0} : [Class = Moderately Satisfied]
Weak {0.6, 0.39999998, 0.0, 0.0, 0.0} : [Content]
Medium {0.0, 0.0, 1.0, 0.0, 0.0} : [Payment System Availability]
Weak {0.4666667, 0.5333333, 0.0, 0.0, 0.0} : [Class = Satisfied]
Strong {0.0, 0.0, 0.0, 1.0, 0.0} : [Content]
Very Strong {0.0, 0.0, 0.0, 0.42857146, 0.57142854} : [Payment System Availability]
Very Strong {0.0, 0.0, 0.0, 0.6666667, 0.33333334} : [Class = Satisfied]
Very Strong {0.0, 0.0, 0.0, 0.0, 1.0} : [Payment System Availability]
Very Strong {0.0, 0.0, 0.0, 0.4, 0.59999996} : [Content]
Strong {0.0, 0.0, 0.0, 1.0, 0.0} : [Class = Satisfied]
Very Strong {0.0, 0.0, 0.0, 0.71428573, 0.28571427} : [Class = Very Satisfied]
Not Available To Promise : [Content]
Medium {0.0, 0.0, 1.0, 0.0, 0.0} : [Efficiency]
Very Weak {1.0, 0.0, 0.0, 0.0, 0.0} : [Payment System Availability]
Weak {0.33333334, 0.6666667, 0.0, 0.0, 0.0} : [Delivery Time]
less than 24.0 : [Class = Not Satisfied]


Figure 6. A part of induced decision rules


3.3. Performance and discussion 

Our methodology enhances the extraction of e-commerce website features and the classification of customers' opinions. As the results in Figure 6 show, the tests carried out on numeric, categorical and fuzzy features indicate that the proposed algorithm selects an appropriate threshold for stopping growth with respect to the three types of features and gives good classification rates with short computing times. Feature selection not only reduces the dimensions of the problem but also improves customer classification performance by discarding noisy, redundant and unimportant features.

First, references such as [29] analyzed the use of click stream data, but they did not involve the fuzzification of those data based on online customer navigation sessions. Moreover, in the area of value creation in an e-business model, such an approach did not attempt to create value for customers based on their navigational pathways. Second, compared with the selection of success features in e-business in reference [30], our research fuzzy-mines the click stream and uses it as a key to select and judge the features that properly characterize global online store performance. Third, several references such as [19] conducted research on online products that are relevant in terms of high sales, whereas this paper aims to ensure online customer satisfaction. Using the proposed tree induction technique, marketing rules can be generated to match customers to satisfaction categories. In [31], the classification decision tree algorithm takes as input a training dataset consisting of a number of attributes that are either categorical or continuous. The dataset used in the current paper comes from public information on Tmall. More types of data are available in our approach for learning the decision tree classifier, which can handle categorical, discrete, continuous and fuzzy attributes.


4. CONCLUSIONS AND FUTURE WORK
Modeling the user's interests is a challenging task. Our work draws a line between the actual user interest and the acquired user profile by inducing a decision tree ensemble from customers' behavior and classifying customers based on their navigation sessions and their reviews of the online service used. The case study shows that, by fitting the model variables and the variables' restrictions and by using our classification model, an accurate customer profile can be achieved with regard to the assessment of e-service quality. In future work, our approach could be enhanced and extended in several directions:

a. Whereas explicit customer click data are collected for statistical analysis, textual customer reviews need pre-processing. We can explore how text mining could be applied to mine and summarize customers' reviews. This deserves more study.

b. Compare the proposed customer profiling approach with our proposed multi-criteria classification model [32] to show promising results for the adoption of classification techniques.

c. People tend to seek experienced customers' opinions before making their own purchases. Thus, implicit user profiling through social information discovery can be proposed to adequately capture the user's interests.